NSF PAR Search | NSF Public Access Repository


Improving Crowdsourcing-Based Image Classification Through Expanded Input Elicitation and Machine Learning

https://doi.org/10.3389/frai.2022.848056

Yasmin, Romena; Hassan, Md Mahmudulla; Grassel, Joshua T.; Bhogaraju, Harika; Escobedo, Adolfo R.; Fuentes, Olac (June 2022, Frontiers in Artificial Intelligence)

This work investigates how different forms of input elicitation obtained from crowdsourcing can be utilized to improve the quality of inferred labels for image classification tasks, where an image must be labeled as either positive or negative depending on the presence/absence of a specified object. Five types of input elicitation methods are tested: binary classification (positive or negative); the (x, y)-coordinate of the position where participants believe a target object is located; level of confidence in the binary response (on a scale from 0 to 100%); what participants believe the majority of the other participants' binary classification is; and participants' perceived difficulty level of the task (on a discrete scale). We design two crowdsourcing studies to test the performance of a variety of input elicitation methods and utilize data from over 300 participants. Various existing voting and machine learning (ML) methods are applied to make the best use of these inputs. In an effort to assess their performance on classification tasks of varying difficulty, a systematic synthetic image generation process is developed. Each generated image combines items from the MPEG-7 Core Experiment CE-Shape-1 Test Set into a single image using multiple parameters (e.g., density, transparency, etc.) and may or may not contain a target object. The difficulty of these images is validated by the performance of an automated image classification method. Experimental results suggest that more accurate classifications can be achieved with smaller training datasets when both the crowdsourced binary classification labels and the average of the self-reported confidence values in these labels are used as features for the ML classifiers.
Moreover, when a relatively larger properly annotated dataset is available, in some cases augmenting these ML algorithms with the results (i.e., probability of outcome) from an automated classifier can achieve even higher performance than what can be obtained by using any one of the individual classifiers. Lastly, supplementary analysis of the collected data demonstrates that other performance metrics of interest, namely reduced false-negative rates, can be prioritized through special modifications of the proposed aggregation methods.
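The abstract does not spell out how the crowdsourced inputs are turned into classifier features. As a minimal sketch, assuming each image receives several responses that each carry a binary label and a 0 to 100 confidence value, the code below builds two per-image features (the fraction of positive votes and the mean self-reported confidence) and feeds them to a plain logistic regression written from scratch. The `Response` schema, `image_features`, `train_logistic`, and the toy data are hypothetical, and logistic regression only stands in for whichever ML classifiers the authors actually used.

```python
import math
from dataclasses import dataclass


@dataclass
class Response:
    """One participant's answer for one image (hypothetical schema)."""
    label: int         # binary classification: 1 = positive, 0 = negative
    confidence: float  # self-reported confidence in that label, 0 to 100


def image_features(responses):
    """Aggregate an image's responses into [fraction of positive votes, mean confidence in 0..1]."""
    n = len(responses)
    frac_pos = sum(r.label for r in responses) / n
    mean_conf = sum(r.confidence for r in responses) / n / 100.0
    return [frac_pos, mean_conf]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient-descent logistic regression; a stand-in for any ML classifier."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # prediction error for this image: p(positive) minus the true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # average the gradients over the batch and take one descent step
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(model, x):
    """Probability that an image with feature vector x is positive."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up training data: one feature vector per image plus its ground-truth label.
X_train = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.6], [0.2, 0.7], [0.1, 0.8], [0.3, 0.5]]
y_train = [1, 1, 1, 0, 0, 0]
model = train_logistic(X_train, y_train)

new_image = image_features([Response(1, 90), Response(1, 75), Response(0, 40)])
print(predict_proba(model, new_image))
```

Under the same assumptions, the output probability of an automated classifier, mentioned in the abstract as a possible augmentation, could be appended to each feature vector as a third column; `train_logistic` needs no change because it sizes its weights from the input.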
Enhancing Image Classification Capabilities of Crowdsourcing-Based Methods through Expanded Input Elicitation

Yasmin, Romena; Grassel, Joshua T.; Hassan, Md Mahmudulla; Fuentes, Olac; Escobedo, Adolfo R. (November 2021, Proceedings of the Ninth AAAI Conference on Human Computation and Crowdsourcing (HCOMP2021))
Kamar, Ece; Luther, Kurt (Ed.)
This study investigates how different forms of input elicitation obtained from crowdsourcing can be utilized to improve the quality of inferred labels for image classification tasks, where an image must be labeled as either positive or negative depending on the presence/absence of a specified object. Three types of input elicitation methods are tested: binary classification (positive or negative); level of confidence in the binary response (on a scale from 0 to 100%); and what participants believe the majority of the other participants’ binary classification is. We design a crowdsourcing experiment to test the performance of the proposed input elicitation methods and use data from over 200 participants. Various existing voting and machine learning (ML) methods are applied, and new ones are developed, to make the best use of these inputs. In an effort to assess their performance on classification tasks of varying difficulty, a systematic synthetic image generation process is developed. Each generated image combines items from the MPEG-7 Core Experiment CE-Shape-1 Test Set into a single image using multiple parameters (e.g., density, transparency, etc.) and may or may not contain a target object. The difficulty of these images is validated by the performance of an automated image classification method. Experimental results suggest that more accurate classifications can be achieved when using the average of the self-reported confidence values as an additional attribute for ML algorithms relative to what is achieved with more traditional approaches. Additionally, they demonstrate that other performance metrics of interest, namely reduced false-negative rates, can be prioritized through special modifications of the proposed aggregation methods that leverage the variety of elicited inputs.
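The abstract does not specify how the aggregation methods were modified to favor fewer false negatives. One simple possibility, shown here as a sketch and not as the authors' method, is a confidence-weighted vote with an adjustable decision threshold: an image is labeled positive when the share of total self-reported confidence behind the positive label reaches `threshold`. The function names, the default threshold of 0.5, and the toy responses are illustrative assumptions.

```python
def weighted_vote(labels, confidences, threshold=0.5):
    """Label an image 1 if the confidence-weighted share of positive votes reaches threshold.

    labels: binary classifications (1 = positive, 0 = negative), one per participant.
    confidences: matching self-reported confidence values, 0 to 100.
    """
    total = sum(confidences)
    if total == 0:
        # Nobody reported any confidence: fall back to an unweighted vote share.
        share = sum(labels) / len(labels)
    else:
        share = sum(c for lab, c in zip(labels, confidences) if lab == 1) / total
    return 1 if share >= threshold else 0


def false_negative_rate(preds, truth):
    """Fraction of truly positive images that were labeled negative."""
    positive_preds = [p for p, t in zip(preds, truth) if t == 1]
    if not positive_preds:
        return 0.0
    return sum(1 for p in positive_preds if p == 0) / len(positive_preds)


# Made-up responses for three images as (labels, confidences), plus ground truth.
images = [([1, 0, 0], [90, 50, 50]), ([1, 1, 0], [60, 60, 70]), ([0, 0, 1], [80, 80, 20])]
truth = [1, 1, 0]

for threshold in (0.5, 0.3):
    preds = [weighted_vote(labels, conf, threshold) for labels, conf in images]
    print(threshold, preds, false_negative_rate(preds, truth))
# prints: 0.5 [0, 1, 0] 0.5
# then:   0.3 [1, 1, 0] 0.0
```

Lowering `threshold` from 0.5 to 0.3 flips the first image to positive and removes its false negative; on real data the same change would typically also raise the false-positive rate, so the threshold acts as a tunable trade-off.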
A machine learning platform to estimate anti-SARS-CoV-2 activities

https://doi.org/10.1038/s42256-021-00335-w

KC, Govinda B.; Bocci, Giovanni; Verma, Srijan; Hassan, Md Mahmudulla; Holmes, Jayme; Yang, Jeremy J.; Sirimulla, Suman; Oprea, Tudor I. (May 2021, Nature Machine Intelligence)
Strategies for drug discovery and repositioning are urgently needed with respect to COVID-19. Here we present REDIAL-2020, a suite of computational models for estimating small molecule activities in a range of SARS-CoV-2-related assays. Models were trained using publicly available, high-throughput screening data and by employing different descriptor types and various machine learning strategies. Here we describe the development and use of eleven models that span the areas of viral entry, viral replication, live virus infectivity, in vitro infectivity and human cell toxicity. REDIAL-2020 is available as a web application through the DrugCentral web portal (http://drugcentral.org/Redial). The web application also provides similarity search results that display the most similar molecules to the query, as well as associated experimental data. REDIAL-2020 can serve as a rapid online tool for identifying active molecules for COVID-19 treatment.
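The abstract does not say how REDIAL-2020 measures molecular similarity. A common choice in cheminformatics is the Tanimoto (Jaccard) coefficient on binary fingerprints, |A ∩ B| / |A ∪ B|. The sketch below assumes fingerprints are already available as sets of on-bit indices and ranks a small library against a query. The set-based representation, `tanimoto`, `most_similar`, and the toy bit sets are all hypothetical; a real tool would compute fingerprints from molecular structures with a cheminformatics toolkit.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar(query: set[int], library: dict[str, set[int]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k library entries with the highest Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy fingerprints (made-up bit indices, not real molecules).
library = {"mol_A": {1, 2, 3, 5}, "mol_B": {2, 3, 4}, "mol_C": {7, 8, 9}}
print(most_similar({1, 2, 3}, library, k=2))
# prints [('mol_A', 0.75), ('mol_B', 0.5)]
```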